Translation as Annotation
Abstract
In this paper we illustrate an approach to creating high-quality linguistically annotated resources by exploiting aligned parallel corpora. The approach rests on the key notion that translating a text can be seen as a linguistic annotation task, one that is easier than manual annotation with formal schemes. After translation, formal annotations can be derived automatically from the aligned translated texts. We show that translations can be exploited in several ways to speed up and automate the linguistic annotation of texts. If none of the texts is already annotated, information from the aligned texts can be used to carry out the annotation from scratch. Conversely, if the texts in one language have been annotated and the others have not, annotations can be transferred from one language to the other. This transfer-based method makes it possible to exploit existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new languages with greatly reduced human effort.
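The transfer step described in the abstract can be illustrated with a minimal sketch of annotation projection across a word alignment. This is not the authors' implementation: the function name `project_tags`, the toy English-Italian sentence pair, the part-of-speech tags, and the hand-written alignment are all assumptions made for illustration. In practice the alignment would come from an automatic word aligner, and a real system would need a policy for unaligned and many-to-one tokens.

```python
from typing import List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each source token's tag to the target token aligned with it.

    `alignment` holds (source_index, target_index) pairs. Target tokens
    with no alignment link keep the value None.
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        # Simplifying assumption: the first link to a target token wins;
        # later links to the same target token are ignored.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


# Toy data (hypothetical): an annotated English sentence and its
# unannotated Italian translation, with a hand-written word alignment.
english_tokens = ["the", "cat", "sleeps"]
english_tags = ["DET", "NOUN", "VERB"]
italian_tokens = ["il", "gatto", "dorme", "tranquillamente"]
alignment = [(0, 0), (1, 1), (2, 2)]

italian_tags = project_tags(english_tags, alignment, len(italian_tokens))
print(list(zip(italian_tokens, italian_tags)))
```

The unaligned token `tranquillamente` receives no tag here, which shows where residual human effort would still be needed after projection.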
Similar resources
Symmetric Statistical Translation Models for Automatic Image Annotation
Automatic image annotation provides means for users to search image collections on the semantic level using natural language queries. In the past, statistical machine translation models have been successfully applied to automatic image annotation. A problem with this approach is that, due to the skewed distribution of term frequency for annotation words, common words have been overly favored, w...
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. In most cases, inappropriate user-generated tags and untagged images, both among the challenges in this field, have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Annotation in Architecture: A Systematic Approach toward Mobilization and Development of Theoretical, Research, and Critical Basis in Architecture
Annotations usually refer to marginal notes that explain a difficult or ambiguous subject, or provide a general definition or a critical remark on a particular part of a text. Historically, annotating was a well-known tradition in Islamic sciences and was used especially in times when there was less potential for generating new knowledge. The main question of this research is, can the tradi...
MetaMorpho TM: A Rule-Based Translation Corpus
This paper discusses aspects of bilingual resource processing within a rule-based translation memory (TM) system currently being developed. Translation memories can be viewed as translation tools incorporating parallel corpora, mainly aligned at the sentence level. Usually, these corpora have no linguistic annotation, as commercial TM systems perform queries at the character level, using f...
Word Alignment Annotation in a Japanese-Chinese Parallel Corpus
Parallel corpora are critical resources for machine translation research and development because they contain translation equivalences of various granularities. Manual annotation of word alignment is important for providing a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents the work...